Appendix B: Point Emission Data
5.9 Overview
In this tutorial, we will demonstrate how to represent PM2.5 point emission data on maps. As an example, we will look at the PM2.5 data for four states - Illinois, Indiana, Wisconsin, and Michigan. The goal of this exercise is to create a point data map that clearly visualizes the point sources of PM2.5 emissions. To summarize, our objectives are to:
- Gain familiarity with the pollution data from the 2014 National Emissions Inventory
- Perform simple data manipulation on the PM2.5 data
- Visualize PM2.5 pollution using the “tmap” package
5.10 Environment Setup
Input/Output
The files used in this tutorial are the pollution data for Illinois, Indiana, Wisconsin, and Michigan as well as the shapefile of the four states. The files are available for download on this project’s GitHub repo.
Load Libraries
We start by loading the necessary packages - tidyverse, sf, and tmap:
- tidyverse: to conduct basic statistical analyses
- sf: to perform simple spatial data manipulation
- tmap: to create spatial data visualizations
library(tidyverse)
library(sf)
library(tmap)
Load Data
Besides the packages, we also need to load our data. The full data set can be loaded with the code below, which creates a data frame called pe1.
pe1 <- read_csv("process_12345.csv")
Note that this file is unfortunately too large to be uploaded to GitHub, so we will instead use a pre-processed data set, four_state.csv.
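The command that loads the pre-processed file does not appear in the text. A minimal sketch follows, under two assumptions: the file is the four_state.csv named in the remark in the Data Manipulation section, and it sits in a ./data folder next to the shapefile. The path is hypothetical, and the data frame name fourstate.pe is inferred from the later code.

```r
library(readr)

# Hypothetical path: the tutorial names the file four_state.csv but not its folder
four.state.path <- "./data/four_state.csv"

# Read the file only if it is present, so the sketch does not fail elsewhere
if (file.exists(four.state.path)) {
  # Named fourstate.pe so that the later filtering steps can use it directly
  fourstate.pe <- read_csv(four.state.path)
}
```

If the file lives somewhere else, only four.state.path needs to change.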
We also load the shapefile for the four states:
fourstates <- st_read("./data/FourStates")
## Reading layer `FourStates' from data source `/Users/LorenzMenendez/Desktop/OpenAirQ-toolkit/data/FourStates' using driver `ESRI Shapefile'
## Simple feature collection with 4 features and 14 fields
## geometry type: POLYGON
## dimension: XY
## bbox: xmin: -92.88943 ymin: 36.9703 xmax: -82.12297 ymax: 48.30606
## geographic CRS: WGS 84
For a more detailed description of the data, please refer to the main chapters of this tutorial book.
5.11 Data Manipulation
Once our data and packages are ready, we start the data manipulation process. Since we only look at the data from Illinois, Indiana, Wisconsin, and Michigan, we use the filter function to keep only the four states that we are interested in, and we name the new data frame fourstate.pe. This is accomplished by the code below.
This dataset comes from the 2014 National Emissions Inventory, which is “a comprehensive and detailed estimate of air emissions of criteria pollutants, criteria precursors, and hazardous air pollutants from air emissions sources.” You can read more about this dataset on this website.
states.abbr <- c("IL", "IN", "WI", "MI")
fourstate.pe <- pe1 %>%
  filter(state %in% states.abbr)
(Remark: The lines of code above do not need to be run if the data file loaded is four_state.csv. They were written to clean the data from process_12345.csv.)
Next, we turn our focus to the pollutant data. To familiarize ourselves with the variable pollutant desc, we use the unique function to examine all the unique values of this variable.
#Find pollutant names for pm2.5
unique(fourstate.pe$`pollutant desc`)
## [1] "Hexane"
## [2] "Toluene"
## [3] "Propionaldehyde"
## [4] "Xylenes (Mixed Isomers)"
## [5] "Benzo[g,h,i,]Perylene"
## [6] "Indeno[1,2,3-c,d]Pyrene"
## [7] "Benzo[b]Fluoranthene"
## [8] "Fluoranthene"
## [9] "Benzo[k]Fluoranthene"
## [10] "Acenaphthylene"
## [11] "Chrysene"
## [12] "Formaldehyde"
## [13] "Benzo[a]Pyrene"
## [14] "2,2,4-Trimethylpentane"
## [15] "Benz[a]Anthracene"
## [16] "Benzene"
## [17] "Lead"
## [18] "Acetaldehyde"
## [19] "Acenaphthene"
## [20] "Phenanthrene"
## [21] "Fluorene"
## [22] "Naphthalene"
## [23] "Carbon Monoxide"
## [24] "Nitrogen Oxides"
## [25] "PM10 Primary (Filt + Cond)"
## [26] "PM2.5 Primary (Filt + Cond)"
## [27] "Sulfur Dioxide"
## [28] "Volatile Organic Compounds"
## [29] "Ethyl Benzene"
## [30] "Styrene"
## [31] "1,3-Butadiene"
## [32] "Acrolein"
## [33] "Anthracene"
## [34] "m-Xylene"
## [35] "Phenol"
## [36] "Methanol"
## [37] "2-Methylnaphthalene"
## [38] "o-Xylene"
## [39] "Cumene"
## [40] "Dibenzo[a,h]Anthracene"
## [41] "PM2.5 Filterable"
## [42] "PM10 Filterable"
## [43] "PM Condensible"
## [44] "Carbon Dioxide"
## [45] "Methane"
## [46] "Nitrous Oxide"
## [47] "Cadmium"
## [48] "Hexamethylene Diisocyanate"
## [49] "Ammonia"
## [50] "Glycol Ethers"
## [51] "Manganese"
## [52] "Methylene Chloride"
## [53] "Chromium III"
## [54] "Chromium (VI)"
## [55] "Nickel"
## [56] "Arsenic"
## [57] "Cobalt"
## [58] "Mercury"
## [59] "Antimony"
## [60] "Selenium"
## [61] "Beryllium"
## [62] "Hydrochloric Acid"
## [63] "Hydrogen Fluoride"
## [64] "Pyrene"
## [65] "Isophorone"
## [66] "Vinyl Chloride"
## [67] "Biphenyl"
## [68] "Tetrachloroethylene"
## [69] "Methyl Chloroform"
## [70] "PAH/POM - Unspecified"
## [71] "2,4-Dinitrophenol"
## [72] "Chlorine"
## [73] "Pentachlorophenol"
## [74] "4-Nitrophenol"
## [75] "Ethyl Chloride"
## [76] "Carbon Disulfide"
## [77] "Ethylidene Dichloride"
## [78] "Propylene Dichloride"
## [79] "Trichloroethylene"
## [80] "1,1,2,2-Tetrachloroethane"
## [81] "Ethylene Dichloride"
## [82] "Acrylonitrile"
## [83] "Methyl Isobutyl Ketone"
## [84] "Chlorobenzene"
## [85] "Carbonyl Sulfide"
## [86] "Carbon Tetrachloride"
## [87] "Chloroform"
## [88] "Dimethyl Phthalate"
## [89] "PAH, total"
## [90] "Acrylic Acid"
## [91] "Polychlorinated Biphenyls"
## [92] "Ethylene Dibromide"
## [93] "1,3-Dichloropropene"
## [94] "Methyl Chloride"
## [95] "Phosphorus"
## [96] "Propylene Oxide"
## [97] "Vinyl Acetate"
## [98] "Methyl Tert-Butyl Ether"
## [99] "Acetophenone"
## [100] "Benzo[e]Pyrene"
## [101] "Perylene"
## [102] "3-Methylcholanthrene"
## [103] "Benzofluoranthenes"
## [104] "p-Xylene"
## [105] "Methyl Bromide"
## [106] "Quinoline"
## [107] "2,4-Dichlorophenoxy Acetic Acid"
## [108] "1-Bromopropane"
## [109] "Allyl Chloride"
## [110] "Ethylene Glycol"
## [111] "Chloromethyl Methyl Ether"
## [112] "Hexachlorobenzene"
## [113] "Triethylamine"
## [114] "2,4,6-Trichlorophenol"
## [115] "Hydrazine"
## [116] "N,N-Dimethylformamide"
## [117] "Acetonitrile"
## [118] "Vinylidene Chloride"
## [119] "Phosgene"
## [120] "Chloroacetic Acid"
## [121] "2,4-Toluene Diisocyanate"
## [122] "Diethanolamine"
## [123] "Methyl Methacrylate"
## [124] "Maleic Anhydride"
## [125] "Hexachloroethane"
## [126] "Cyanide"
## [127] "Phthalic Anhydride"
## [128] "Polycyclic aromatic compounds (includes 25 specific compounds)"
## [129] "Epichlorohydrin"
## [130] "Dimethyl Sulfate"
## [131] "Hydroquinone"
## [132] "Chloroprene"
## [133] "Hydrogen Sulfide"
## [134] "Acrylamide"
## [135] "Cresol/Cresylic Acid (Mixed Isomers)"
## [136] "4,4 -Methylenediphenyl Diisocyanate"
## [137] "Methyl Isocyanate"
## [138] "Tert-butyl Acetate"
## [139] "Bis(2-Ethylhexyl)Phthalate"
## [140] "Ethylene Oxide"
## [141] "1,1,2-Trichloroethane"
## [142] "o-Cresol"
## [143] "Catechol"
## [144] "Sulfur Hexafluoride"
## [145] "2,3,3 ,4,4 ,5/2,3,3 ,4,4 ,5-Hexachlorobiphenyl (PCBs156/157)"
## [146] "2,3,3 ,4,4 -Pentachlorobiphenyl (PCB-105)"
## [147] "2,3 ,4,4 ,5-Pentachlorobiphenyl (PCB118)"
## [148] "3,3 ,4,4 -Tetrachlorobiphenyl (PCB-77)"
## [149] "2,3 ,4,4 ,5,5 -Hexachlorobiphenyl (PCB-167)"
## [150] "2,3,4,4 ,5-Pentachlorobiphenyl (PCB-114)"
## [151] "Methylhydrazine"
## [152] "Benzyl Chloride"
## [153] "1,4-Dichlorobenzene"
## [154] "Bromoform"
## [155] "2-Nitropropane"
## [156] "2-Chloronaphthalene"
## [157] "Quinone"
## [158] "2-Chloroacetophenone"
## [159] "2,4-Dinitrotoluene"
## [160] "Carbazole"
## [161] "Aniline"
## [162] "p-Dioxane"
## [163] "4,6-Dinitro-o-Cresol"
## [164] "Dibutyl Phthalate"
## [165] "1,3-Propanesultone"
## [166] "Benzidine"
## [167] "Dibenzofuran"
## [168] "Ethyl Carbamate"
## [169] "1,2-Epoxybutane"
## [170] "Hydrogen Cyanide"
## [171] "Cellosolve Acetate"
## [172] "Ethyl Acrylate"
## [173] "o-Toluidine"
## [174] "7,12-Dimethylbenz[a]Anthracene"
## [175] "Captan"
## [176] "5-Methylchrysene"
## [177] "Methyl Iodide"
## [178] "Nitrobenzene"
## [179] "Hexachlorobutadiene"
## [180] "Hexachlorocyclopentadiene"
## [181] "1,2,4-Trichlorobenzene"
## [182] "Asbestos"
## [183] "Dichloroethyl Ether"
## [184] "Coke Oven Emissions"
## [185] "Trifluralin"
## [186] "Toxaphene"
## [187] "Carbaryl"
## [188] "Heptachlor"
## [189] "4,4 -Methylenebis(2-Chloraniline)"
## [190] "m-Cresol"
## [191] "Methoxychlor"
## [192] "Phosphine"
## [193] "Calcium Cyanamide"
## [194] "1,2-Propylenimine"
## [195] "Diethyl Sulfate"
## [196] "p-Cresol"
## [197] "Ethylene Thiourea"
## [198] "Chlordane"
## [199] "N,N-Dimethylaniline"
## [200] "p-Phenylenediamine"
## [201] "o-Anisidine"
## [202] "1,2-Diphenylhydrazine"
We see that there are quite a number of different pollutants, but we are primarily interested in the PM2.5 data. Therefore, we use the filter function again to subset the data, retaining only those observations with the pollutant being “PM2.5 Filterable” or “PM2.5 Primary (Filt + Cond)”.
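With roughly 200 pollutant names, scanning the printed list by eye is error-prone. As an aside, the PM2.5 entries can also be pulled out by pattern matching. The sketch below uses base R's grep on a small stand-in vector, not the full data. The name pollutant.names is hypothetical, and its values are copied from the output above.

```r
# Stand-in for unique(fourstate.pe$`pollutant desc`); values copied from the output above
pollutant.names <- c("Hexane", "PM10 Primary (Filt + Cond)",
                     "PM2.5 Primary (Filt + Cond)", "PM2.5 Filterable",
                     "PM10 Filterable", "PM Condensible")

# fixed = TRUE treats "PM2.5" as a literal string, so the dot is not a regex wildcard
pm25.names <- grep("PM2.5", pollutant.names, fixed = TRUE, value = TRUE)
print(pm25.names)
```

On this stand-in vector the match returns the same two names that the tutorial spells out explicitly in the code that follows.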
pm25 <- c("PM2.5 Filterable", "PM2.5 Primary (Filt + Cond)")
#Filter for pm2.5
fourstate.pm <- fourstate.pe %>%
  filter(`pollutant desc` %in% pm25)
Now we take care of duplicates and missing values, both of which should be removed from our data. The following line of code removes rows with a duplicated eis facility id, keeping the first row for each facility.
#remove duplicates
fourstate.pm.final <- fourstate.pm[!duplicated(fourstate.pm$`eis facility id`), ]
The following line of code removes rows with a missing value in site latitude. We call this cleaned data frame fourstate.pm.final.
#remove na coords
fourstate.pm.final <- fourstate.pm.final[!is.na(fourstate.pm.final$`site latitude`), ]
The last step in the data manipulation process is to turn our data points into a spatial object, which we can accomplish with the st_as_sf function. Notice that we use the coordinate reference system EPSG:4326 (WGS 84), a geographic coordinate system with worldwide coverage and the same CRS that was reported for the shapefile when we loaded it. More information about coordinate reference systems can be found on this website.
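To make the three cleaning steps concrete, here is a small sketch on invented toy data. The data frame toy and all of its values are hypothetical; only the column names mirror the tutorial's data. It illustrates two details that are easy to miss: duplicated() keeps the first row per facility, so a facility that reports both PM2.5 categories retains only whichever row comes first, and coords expects longitude before latitude.

```r
library(sf)

# Toy data with the tutorial's column names; all values are invented
toy <- data.frame(
  `eis facility id` = c(1, 1, 2, 3),
  `pollutant desc` = c("PM2.5 Filterable", "PM2.5 Primary (Filt + Cond)",
                       "PM2.5 Primary (Filt + Cond)", "PM2.5 Filterable"),
  `total emissions` = c(0.5, 0.8, 2.1, 1.3),
  `site latitude` = c(41.88, 41.88, NA, 43.07),
  `site longitude` = c(-87.63, -87.63, -86.16, -89.40),
  check.names = FALSE
)

# duplicated() flags the second and later rows per facility, so the first row is kept
toy.unique <- toy[!duplicated(toy$`eis facility id`), ]

# Drop rows without a latitude (facility 2 in this toy example)
toy.clean <- toy.unique[!is.na(toy.unique$`site latitude`), ]

# coords is given as c(x, y): longitude first, then latitude
toy.sf <- st_as_sf(toy.clean,
                   coords = c("site longitude", "site latitude"),
                   crs = 4326)
print(toy.sf)
```

Two facilities remain here (ids 1 and 3), and facility 1 is represented by its "PM2.5 Filterable" row only. The tutorial's own conversion of the real data follows.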
#Turn into sf object
fourstate.pm.spatial <- st_as_sf(fourstate.pm.final, coords = c("site longitude", "site latitude"), crs = 4326)
5.12 Making Point Data Maps
Finally, we are ready to make some maps! The command for generating a point data map is actually quite easy. We just need to specify the shapefile that stores the shape of the states and the point data which we have turned into a spatial object earlier. We adjust the size of the dots to 0.01 so that the pattern of those points is discernible.
tm_shape(fourstates) +
  tm_borders() +
  tm_shape(fourstate.pm.spatial) +
  tm_dots(size = 0.01)
The map above is neat, but it conveys only limited information. It is impossible, for example, to tell from the map how much PM2.5 each of these points emits.
To improve the simple map above, we use the tm_bubbles function instead of tm_dots. Within tm_bubbles, we can choose how to classify the PM2.5 data by specifying the style. Some common choices of style include “fisher”, “jenks”, and “quantile”. Customization of the color palette is also possible, and there are plenty of online tutorials on this topic.
tm_shape(fourstates) +
  tm_borders() +
  tm_shape(fourstate.pm.spatial) +
  tm_bubbles(col = "total emissions", size = 0.01,
             style = "fisher", palette = "Reds")
As we can see, the map above is not only more aesthetically pleasing, but it also communicates more information about the quantity of PM2.5 produced by each site. Sites that generate more PM2.5 are shown in a darker red. With this map, we can easily identify the sites of heavy PM2.5 emission by zooming in.
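The text mentions zooming in, but the code shown does not say how the map was made zoomable. One option, offered here as an assumption rather than as the tutorial's own method, is tmap's interactive "view" mode. The sketch below uses invented toy points; toy.df, toy.points, toy.map, and all values are hypothetical. It follows the version 3 style of tmap syntax used in this tutorial, so newer tmap versions may print deprecation messages.

```r
library(sf)
library(tmap)

# Invented toy points; column names and values are placeholders
toy.df <- data.frame(
  emissions = c(0.5, 2.1, 1.3),
  lon = c(-87.63, -86.16, -89.40),
  lat = c(41.88, 39.77, 43.07)
)
toy.points <- st_as_sf(toy.df, coords = c("lon", "lat"), crs = 4326)

# Switch from the default static "plot" mode to the interactive "view" mode
tmap_mode("view")

# Build the map object; print(toy.map) would then open a zoomable map
toy.map <- tm_shape(toy.points) +
  tm_dots(col = "emissions")
```

Calling tmap_mode("plot") afterwards switches back to static maps like the ones shown earlier.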
This concludes our tutorial. Following the steps above will allow you to create simple point data maps, which are often a neat and easily interpretable way to visualize spatial data.